{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "0b692c73",
   "metadata": {},
   "source": [
    "# Milvus Vector Store"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "1e7787c2",
   "metadata": {},
   "source": [
    "In this notebook we are going to show a quick demo of using the MilvusVectorStore. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "47264e32",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-02-10T12:20:23.988789Z",
     "start_time": "2023-02-10T12:20:23.967877Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/filiphaltmayer/miniconda3/envs/llama/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n"
     ]
    }
   ],
   "source": [
    "import logging\n",
    "import sys\n",
    "\n",
    "# Uncomment to see debug logs\n",
    "# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)\n",
    "# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))\n",
    "\n",
    "from llama_index import VectorStoreIndex, SimpleDirectoryReader, Document\n",
    "from llama_index.vector_stores import MilvusVectorStore\n",
    "from IPython.display import Markdown, display\n",
    "import textwrap"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "f9b97a89",
   "metadata": {},
   "source": [
    "### Setup OpenAI\n",
    "Lets first begin by adding the openai api key. This will allow us to access openai for embeddings and to use chatgpt."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "0c9f4d21-145a-401e-95ff-ccb259e8ef84",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-02-10T12:20:24.908956Z",
     "start_time": "2023-02-10T12:20:24.537064Z"
    },
    "pycharm": {
     "is_executing": true
    }
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"sk-\""
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "59ff935d",
   "metadata": {},
   "source": [
    "### Generate our data\n",
    "With our LLM set, lets start using the Milvus Index. As a first example, lets generate a document from the file found in the `paul_graham_essay/data` folder. In this folder there is a single essay from Paul Graham titled `What I Worked On`. To generate the documents we will use the SimpleDirectoryReader."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "68cbd239-880e-41a3-98d8-dbb3fab55431",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-02-10T12:20:30.175678Z",
     "start_time": "2023-02-10T12:20:30.172456Z"
    },
    "pycharm": {
     "is_executing": true
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Document ID: 05b6691b-d567-43a2-94e1-e9ca81cd4624 Document Hash: 77ae91ab542f3abb308c4d7c77c9bc4c9ad0ccd63144802b7cbe7e1bb3a4094e\n"
     ]
    }
   ],
   "source": [
    "# load documents\n",
    "documents = SimpleDirectoryReader(\"../data/paul_graham/\").load_data()\n",
    "print(\"Document ID:\", documents[0].doc_id, \"Document Hash:\", documents[0].doc_hash)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "dd270925",
   "metadata": {},
   "source": [
    "### Create an index across the data\n",
    "Now that we have a document, we can can create an index and insert the document. For the index we will use a GPTMilvusIndex. GPTMilvusIndex takes in a few arguments:\n",
    "\n",
    "- collection_name (str, optional): The name of the collection where data will be stored. Defaults to \"llamalection\".\n",
    "- index_params (dict, optional): The index parameters for Milvus, if none are provided an HNSW index will be used. Defaults to None.\n",
    "- search_params (dict, optional): The search parameters for a Milvus query. If none are provided, default params will be generated. Defaults to None.\n",
    "- dim (int, optional): The dimension of the embeddings. If it is not provided, collection creation will be done on first insert. Defaults to None.\n",
    "- host (str, optional): The host address of Milvus. Defaults to \"localhost\".\n",
    "- port (int, optional): The port of Milvus. Defaults to 19530.\n",
    "- user (str, optional): The username for RBAC. Defaults to \"\".\n",
    "- password (str, optional): The password for RBAC. Defaults to \"\".\n",
    "- use_secure (bool, optional): Use https. Defaults to False.\n",
    "- overwrite (bool, optional): Whether to overwrite existing collection with same name. Defaults to False.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "ba1558b3",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-02-10T12:20:33.735897Z",
     "start_time": "2023-02-10T12:20:30.404245Z"
    },
    "pycharm": {
     "is_executing": true
    }
   },
   "outputs": [],
   "source": [
    "# Create an index over the documnts\n",
    "from llama_index.storage.storage_context import StorageContext\n",
    "\n",
    "\n",
    "vector_store = MilvusVectorStore(overwrite=True)\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "04304299-fc3e-40a0-8600-f50c3292767e",
   "metadata": {},
   "source": [
    "### Query the data\n",
    "Now that we have our document stored in the index, we can ask questions against the index. The index will use the data stored in itself as the knowledge base for chatgpt."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "35369eda",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-02-10T12:20:51.328762Z",
     "start_time": "2023-02-10T12:20:33.822688Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " The author learned that the AI programs of the time were not capable of understanding natural\n",
      "language, and that the field of AI was a hoax. He also learned that he could make art, and that he\n",
      "could pass the entrance exam for the Accademia di Belli Arti in Florence. He also learned Lisp\n",
      "hacking and wrote his dissertation on applications of continuations.\n"
     ]
    }
   ],
   "source": [
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"What did the author learn?\")\n",
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "99212d33",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-02-10T12:21:10.337294Z",
     "start_time": "2023-02-10T12:20:51.338718Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " A hard moment for the author was when he realized that the AI programs of the time were a hoax and\n",
      "that there was an unbridgeable gap between what they could do and actually understanding natural\n",
      "language.\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\"What was a hard moment for the author?\")\n",
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "64cc925b",
   "metadata": {},
   "source": [
    "This next test shows that overwriting removes the previous data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "8d641e24",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Res: \n",
      "The author is unknown.\n"
     ]
    }
   ],
   "source": [
    "vector_store = MilvusVectorStore(overwrite=True)\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    [Document(text=\"The number that is being searched for is ten.\")], storage_context\n",
    ")\n",
    "query_engine = index.as_query_engine()\n",
    "res = query_engine.query(\"Who is the author?\")\n",
    "print(\"Res:\", res)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "d8123529",
   "metadata": {},
   "source": [
    "The next test shows adding additional data to an already existing  index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "a5c429a4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Res: \n",
      "The number is ten.\n"
     ]
    }
   ],
   "source": [
    "del index, vector_store, storage_context, query_engine\n",
    "\n",
    "vector_store = MilvusVectorStore(overwrite=False)\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)\n",
    "query_engine = index.as_query_engine()\n",
    "res = query_engine.query(\"What is the number?\")\n",
    "print(\"Res:\", res)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "e5287c2d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Res: \n",
      "The author is Paul Graham.\n"
     ]
    }
   ],
   "source": [
    "res = query_engine.query(\"Who is the author?\")\n",
    "print(\"Res:\", res)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
